Abstract

In recent years, a number of results have been developed that connect information measures and
estimation measures under various models, predominantly Gaussian and Poisson models. More
recent results due to Taborda and Perez-Cruz relate the relative entropy to certain mismatched estimation
errors in the context of binomial and negative binomial models, where, unlike in the case of Gaussian and
Poisson models, the conditional mean estimates concern models with different parameters than those of the
original model. In this note, a different set of results in simple forms is developed for binomial and negative
binomial models, where the conditional mean estimates are produced through the original models. The new
results are more consistent with existing results for Gaussian and Poisson models.
I. Introduction



Since a simple differential relationship between the mutual information and the minimum mean
square error over a scalar Gaussian model was discovered [1], a number of similar results have been
developed for several other models, e.g., Poisson models [2], [3]. In the context of Gaussian and
Poisson models, it has also been found that the relative entropy can be expressed as the integral of
the increase of the estimation error due to a mismatched prior distribution [3]-[5].

More recently, Taborda and Perez-Cruz [6] developed results in the context of binomial and
negative binomial models. The key result expresses the derivative of the relative entropy of two
output distributions of the same binomial (or negative binomial) model induced by two different
inputs in terms of certain mismatched estimation errors. As in the case of Gaussian and Poisson
models, the errors concern conditional mean estimates of the input given the output. However, the
conditional mean is produced using binomial (or negative binomial) models with modified parameters.
In particular, in the case of the binomial model, the modified model has one fewer trial than the original
model; in the case of the negative binomial model, the modified model has one more failure than the original
model.

In this note, we develop a different set of results concerning essentially the same binomial and
negative binomial models as in [6]. The results put the derivative of the relative entropy in a simple
form involving an average difference of conditional mean estimates due to a mismatched prior
distribution. In contrast to the results of [6], the conditional mean estimates here are based on the
original binomial and negative binomial models. The results are thus more consistent with existing
results for Gaussian and Poisson models.






II. The Binomial Model 

The binomial model is based on the binomial distribution, Binomial(n, q), which describes the 
probability of having k successful trials in n independent Bernoulli trials, each with probability q to 
succeed: 

$$ P(Y = k) = \binom{n}{k} q^k (1 - q)^{n-k}, \qquad k = 0, \dots, n. \tag{1} $$

With some hindsight, we let the binomial model be a random transformation from a random variable
$X$ which takes its value on $(a, \infty)$ to another random variable $Y$, where, conditioned on $X = x$, $Y$
follows the distribution of Binomial(n, a/x). The conditional probability mass function is given by

$$ P_{Y|X}(y|x) = \binom{n}{y} \left(\frac{a}{x}\right)^{y} \left(1 - \frac{a}{x}\right)^{n-y}, \qquad y = 0, \dots, n. \tag{2} $$

The variables $X$ and $Y$ are viewed as the input and output of the binomial model. Here the input $X$
controls the probability of success of an individual Bernoulli trial: for fixed $X$, the success probability
is $a/X$, so that the success-to-failure ratio is $a : (X - a)$. The larger $X$ is, the fewer trials succeed on
average. The parameter $a$ is viewed as a scaling of the input.

If the prior distribution of the input $X$ is $P_X$, the corresponding output distribution is denoted by
$P_Y$; if the prior distribution of $X$ is $Q_X$, the corresponding output distribution is denoted by $Q_Y$.
Throughout this note, we use $E\{\cdot\}$ and $E\{\cdot|\cdot\}$ to denote expectation and conditional expectation
under distribution $P$, whereas we use $E_Q\{\cdot\}$ and $E_Q\{\cdot|\cdot\}$ to denote expectation and conditional
expectation under distribution $Q$. Thus the conditional mean of $X$ given $Y$ is denoted by $E\{X|Y\}$
under distribution $P$ and by $E_Q\{X|Y\}$ under distribution $Q$.

We also define the following function

$$ g(t) = t - 1 - \log t. \tag{3} $$

This function is convex on $(0, \infty)$, and achieves its unique minimum, 0, at $t = 1$. For two positive
numbers, $x$ and $\hat{x}$, the function $g(x/\hat{x})$ can be viewed as a measure of their difference, in the sense
that it is always nonnegative, and that it is equal to 0 if and only if $x = \hat{x}$. Moreover, $g(x/\hat{x})$ increases
monotonically as $x/\hat{x}$ departs from 1 in either direction of the axis.
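The properties of the function in (3) are easy to confirm numerically. The following is a minimal Python sketch, not part of the original note; the sample ratios are arbitrary illustrative values.

```python
import math


def g(t: float) -> float:
    """g(t) = t - 1 - log(t) as in (3); defined only for t > 0."""
    if t <= 0:
        raise ValueError("g is defined only for t > 0")
    return t - 1.0 - math.log(t)


# Arbitrary sample ratios on both sides of 1 (illustrative values only).
ratios = [0.25, 0.5, 1.0, 2.0, 4.0]
values = [g(t) for t in ratios]
```

The values shrink toward zero as the ratio approaches 1 from either side and vanish exactly at 1, which is the sense in which $g$ measures the mismatch between two positive numbers.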

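Theorem 1. Let $P_Y$ and $Q_Y$ be the output distributions of the binomial model (2) induced by input
distributions $P_X$ and $Q_X$, respectively, where $P_X$ and $Q_X$ put no probability mass on $(-\infty, a]$. Then

$$ \frac{d}{da} D(P_Y \| Q_Y) = \frac{1}{a} E\left\{ Y \, g\!\left( \frac{E\{X|Y\} - a}{E_Q\{X|Y\} - a} \right) \right\}. \tag{4} $$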

Lemma 1. Let $P_Y$ be the probability mass function of the output of the binomial model described
by (2), where the input follows distribution $P_X$ with zero mass on $(-\infty, a)$. For every $y = 0, \dots, n$,

$$ \frac{d}{da} P_Y(y) = \frac{1}{a} \bigl( y P_Y(y) - (y+1) P_Y(y+1) \bigr) \tag{5} $$

where we use the convention that $P_Y(n+1) = 0$. The result remains true if $P_Y$ is replaced by $Q_Y$
in (5).



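Proof: We start with

$$ P_Y(y) = E\left\{ \binom{n}{y} \left(\frac{a}{X}\right)^{y} \left(1 - \frac{a}{X}\right)^{n-y} \right\}. \tag{6} $$

Evidently,

\begin{align}
\frac{d}{da} P_Y(y)
&= E\left\{ \binom{n}{y} \frac{d}{da} \left[ \left(\frac{a}{X}\right)^{y} \left(1 - \frac{a}{X}\right)^{n-y} \right] \right\} \tag{7} \\
&= E\left\{ \binom{n}{y} \left[ \frac{y}{a} \left(\frac{a}{X}\right)^{y} \left(1 - \frac{a}{X}\right)^{n-y} - \frac{n-y}{X} \left(\frac{a}{X}\right)^{y} \left(1 - \frac{a}{X}\right)^{n-y-1} \right] \right\} \tag{8} \\
&= \frac{y}{a} E\left\{ \binom{n}{y} \left(\frac{a}{X}\right)^{y} \left(1 - \frac{a}{X}\right)^{n-y} \right\} - \frac{n-y}{a} E\left\{ \binom{n}{y} \left(\frac{a}{X}\right)^{y+1} \left(1 - \frac{a}{X}\right)^{n-y-1} \right\} \tag{9} \\
&= \frac{y}{a} E\left\{ \binom{n}{y} \left(\frac{a}{X}\right)^{y} \left(1 - \frac{a}{X}\right)^{n-y} \right\} - \frac{y+1}{a} E\left\{ \binom{n}{y+1} \left(\frac{a}{X}\right)^{y+1} \left(1 - \frac{a}{X}\right)^{n-y-1} \right\}. \tag{10}
\end{align}

We note that (7)-(10) hold for $y = 0, \dots, n$. In arriving at (10), we use the identity
$(n-y)\binom{n}{y} = (y+1)\binom{n}{y+1}$ and the convention that $\binom{n}{n+1} = 0$. In fact, the second
term in (9) and the second term in (10) are both equal to 0 for $y = n$. Using (6) again, we arrive
at (5) from (10).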

Since (5) holds for any input distribution $P_X$, it remains true if $P_X$ is replaced by another
distribution $Q_X$, as long as the input is always greater than $a$. ■
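The identity in Lemma 1, namely that the derivative of $P_Y(y)$ with respect to $a$ equals $(y P_Y(y) - (y+1) P_Y(y+1))/a$, can be sanity-checked with a finite difference. The sketch below is not from the original note; the discrete prior, the values of `n` and `a`, and the step size `h` are all hypothetical choices made for illustration.

```python
import math


def binom_output_pmf(n, a, prior):
    """P_Y(y) for y = 0..n; prior is a dict {x: probability} with every x > a."""
    pmf = []
    for y in range(n + 1):
        total = 0.0
        for x, p in prior.items():
            q = a / x  # success probability of one Bernoulli trial
            total += p * math.comb(n, y) * q ** y * (1 - q) ** (n - y)
        pmf.append(total)
    return pmf


def lemma1_rhs(n, a, prior):
    """Right-hand side: (y P_Y(y) - (y+1) P_Y(y+1)) / a, with P_Y(n+1) = 0."""
    p = binom_output_pmf(n, a, prior) + [0.0]
    return [(y * p[y] - (y + 1) * p[y + 1]) / a for y in range(n + 1)]


def lemma1_lhs(n, a, prior, h=1e-6):
    """Left-hand side: central finite difference of P_Y(y) with respect to a."""
    up = binom_output_pmf(n, a + h, prior)
    down = binom_output_pmf(n, a - h, prior)
    return [(u - d) / (2 * h) for u, d in zip(up, down)]


# Hypothetical toy setup: three-point prior supported strictly above a.
n, a = 5, 1.0
prior = {2.0: 0.3, 3.5: 0.5, 6.0: 0.2}
max_gap = max(abs(l - r)
              for l, r in zip(lemma1_lhs(n, a, prior), lemma1_rhs(n, a, prior)))
```

For this toy prior the two sides should agree up to finite-difference error, including at $y = n$, where the second term vanishes by convention.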

Lemma 1 resembles a corresponding result for Gaussian models, where the derivative with respect to the
scaling parameter translates to the derivative with respect to the output variable. For the binomial
model, the output is discrete and the result consists of the difference of the output distribution
(modulated by the variable $y$) in lieu of a derivative.

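We next prove Theorem 1.

Proof of Theorem 1: From

$$ D(P_Y \| Q_Y) = \sum_{y=0}^{n} P_Y(y) \log \frac{P_Y(y)}{Q_Y(y)} \tag{11} $$

it is not difficult to show that

\begin{align}
\frac{d}{da} D(P_Y \| Q_Y)
&= \sum_{y=0}^{n} \frac{dP_Y(y)}{da} \log \frac{P_Y(y)}{Q_Y(y)} - \sum_{y=0}^{n} \frac{P_Y(y)}{Q_Y(y)} \frac{dQ_Y(y)}{da} \tag{12} \\
&= a^{-1} (A - B) \tag{13}
\end{align}

where

\begin{align}
A &= a \sum_{y=0}^{n} \frac{dP_Y(y)}{da} \log \frac{P_Y(y)}{Q_Y(y)} \tag{14} \\
&= \sum_{y=0}^{n} \left( \log \frac{P_Y(y)}{Q_Y(y)} \right) \bigl( y P_Y(y) - (y+1) P_Y(y+1) \bigr) \tag{15} \\
&= \sum_{y=1}^{n} \left( \log \frac{P_Y(y)}{Q_Y(y)} \right) y P_Y(y) - \sum_{y=0}^{n-1} \left( \log \frac{P_Y(y)}{Q_Y(y)} \right) (y+1) P_Y(y+1) \tag{16} \\
&= \sum_{y=1}^{n} y P_Y(y) \left( \log \frac{P_Y(y)}{Q_Y(y)} - \log \frac{P_Y(y-1)}{Q_Y(y-1)} \right) \tag{17} \\
&= \sum_{y=1}^{n} y P_Y(y) \log \frac{P_Y(y) Q_Y(y-1)}{P_Y(y-1) Q_Y(y)} \tag{18}
\end{align}

and

\begin{align}
B &= a \sum_{y=0}^{n} \frac{P_Y(y)}{Q_Y(y)} \frac{dQ_Y(y)}{da} \tag{19} \\
&= \sum_{y=0}^{n} \frac{P_Y(y)}{Q_Y(y)} \bigl( y Q_Y(y) - (y+1) Q_Y(y+1) \bigr) \tag{20} \\
&= \sum_{y=1}^{n} y P_Y(y) - \sum_{y=0}^{n-1} \frac{P_Y(y)}{Q_Y(y)} (y+1) Q_Y(y+1) \tag{21} \\
&= \sum_{y=1}^{n} y P_Y(y) - \sum_{y=1}^{n} \frac{P_Y(y-1)}{Q_Y(y-1)} \, y Q_Y(y) \tag{22} \\
&= \sum_{y=1}^{n} y P_Y(y) \left( 1 - \frac{P_Y(y-1) Q_Y(y)}{P_Y(y) Q_Y(y-1)} \right). \tag{23}
\end{align}

Moreover,

\begin{align}
P_Y(y-1) &= E\left\{ \binom{n}{y-1} \left(\frac{a}{X}\right)^{y-1} \left(1 - \frac{a}{X}\right)^{n-y+1} \right\} \tag{24} \\
&= \frac{y}{n-y+1} E\left\{ \frac{X}{a} \left(1 - \frac{a}{X}\right) \binom{n}{y} \left(\frac{a}{X}\right)^{y} \left(1 - \frac{a}{X}\right)^{n-y} \right\} \tag{25} \\
&= \frac{y}{n-y+1} E\left\{ \frac{X}{a} - 1 \,\middle|\, Y = y \right\} P_Y(y). \tag{26}
\end{align}

Similarly,

$$ Q_Y(y-1) = \frac{y}{n-y+1} E_Q\left\{ \frac{X}{a} - 1 \,\middle|\, Y = y \right\} Q_Y(y). \tag{27} $$

Therefore,

$$ \frac{P_Y(y-1) Q_Y(y)}{P_Y(y) Q_Y(y-1)} = \frac{E\{X|Y=y\} - a}{E_Q\{X|Y=y\} - a}. \tag{28} $$

Plugging (28) into (18) and (23) and subsequently (13), we have

$$ \frac{d}{da} D(P_Y \| Q_Y) = a^{-1} \sum_{y=1}^{n} y P_Y(y) \bigl( T(y) - 1 - \log T(y) \bigr) \tag{29} $$

where $T(y)$ is a shorthand for the function defined as the RHS of (28). By definition (3), we have
established (4) in Theorem 1.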
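The end result of this proof, $\frac{d}{da} D(P_Y \| Q_Y) = a^{-1} \sum_{y=1}^{n} y P_Y(y) \, g(T(y))$ with $T(y) = (E\{X|Y=y\} - a)/(E_Q\{X|Y=y\} - a)$, can be checked numerically. The following Python sketch is an illustration only, assuming two hypothetical discrete priors supported above $a$; it compares a central finite difference of the relative entropy with the estimation-theoretic expression.

```python
import math


def g(t):
    """g(t) = t - 1 - log t from (3)."""
    return t - 1.0 - math.log(t)


def output_stats(n, a, prior):
    """Return (pmf, post_mean): P_Y(y) and E{X | Y = y} for y = 0..n.

    prior is a dict {x: probability} with every x > a (hypothetical toy input).
    """
    pmf, post_mean = [], []
    for y in range(n + 1):
        joint = {x: p * math.comb(n, y) * (a / x) ** y * (1 - a / x) ** (n - y)
                 for x, p in prior.items()}
        total = sum(joint.values())
        pmf.append(total)
        post_mean.append(sum(x * w for x, w in joint.items()) / total)
    return pmf, post_mean


def relative_entropy(n, a, prior_p, prior_q):
    """D(P_Y || Q_Y) in nats for the binomial model with scaling a."""
    p_y, _ = output_stats(n, a, prior_p)
    q_y, _ = output_stats(n, a, prior_q)
    return sum(p * math.log(p / q) for p, q in zip(p_y, q_y))


def derivative_via_estimates(n, a, prior_p, prior_q):
    """(1/a) * sum_y y P_Y(y) g((E{X|y} - a) / (E_Q{X|y} - a))."""
    p_y, mean_p = output_stats(n, a, prior_p)
    _, mean_q = output_stats(n, a, prior_q)
    return sum(y * p_y[y] * g((mean_p[y] - a) / (mean_q[y] - a))
               for y in range(1, n + 1)) / a


# Hypothetical toy setup; both priors put all mass above a + h.
n, a, h = 6, 1.0, 1e-5
prior_p = {1.5: 0.5, 4.0: 0.5}
prior_q = {2.0: 0.7, 8.0: 0.3}
numeric = (relative_entropy(n, a + h, prior_p, prior_q)
           - relative_entropy(n, a - h, prior_p, prior_q)) / (2 * h)
closed_form = derivative_via_estimates(n, a, prior_p, prior_q)
```

Note that both conditional means are computed with the original model (same `n`, same `a`); only the prior differs, which is the point of contrast with the modified-parameter estimates discussed in the introduction.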

III. The Negative Binomial Model

The negative binomial distribution is defined by the following probability mass function

$$ P(Y = y) = \binom{y+r-1}{y} (1 - q)^{r} q^{y}, \qquad y = 0, 1, \dots, \tag{30} $$

which is the probability that $y$ successful trials are seen before the $r$-th failure is observed, where the
trials are independent Bernoulli trials each with probability $q$ to succeed. We denote this distribution
as −Binomial(r, q).

With some hindsight, we define a negative binomial model based on a random transformation
from random variable $X$ to random variable $Y$, where, conditioned on $X = x$, $Y$ has distribution
−Binomial(r, b/(b + x)). That is, the random transformation is given by the conditional probability mass
function

$$ P_{Y|X}(y|x) = \binom{y+r-1}{y} \left(\frac{x}{b+x}\right)^{r} \left(\frac{b}{b+x}\right)^{y}, \qquad y = 0, 1, \dots. \tag{31} $$
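As a quick illustration of this model, the conditional probability mass function with success probability $b/(b+x)$ and $r$ failures can be evaluated directly. The sketch below is not from the original note: it assumes an integer `r`, arbitrary values of `b` and `x`, and a truncation point `y_max` chosen so that the neglected tail is negligible. It also uses the standard fact that a negative binomial variable with success probability $q$ and $r$ failures has mean $rq/(1-q)$, which here equals $rb/x$.

```python
import math


def negbinom_conditional_pmf(y, r, b, x):
    """P(Y = y | X = x): y successes before the r-th failure, q = b / (b + x)."""
    q = b / (b + x)
    return math.comb(y + r - 1, y) * (1 - q) ** r * q ** y


# Hypothetical parameter values; y_max truncates the infinite support.
r, b, x, y_max = 4, 2.0, 3.0, 400
probs = [negbinom_conditional_pmf(y, r, b, x) for y in range(y_max + 1)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
```

Since the conditional mean is $rb/x$, a larger input again yields fewer successes on average, and $b$ plays the role of a scaling of the input, much as $a$ does in the binomial model.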



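Theorem 2. Let $P_Y$ and $Q_Y$ be the output distributions of the negative binomial model (31) induced by input
distributions $P_X$ and $Q_X$, respectively, where $P_X$ and $Q_X$ put no probability mass on $(-\infty, 0]$. Then

$$ \frac{d}{db} D(P_Y \| Q_Y) = \frac{1}{b} E\left\{ Y \, g\!\left( \frac{E\{X|Y\} + b}{E_Q\{X|Y\} + b} \right) \right\}. \tag{32} $$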



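Lemma 2. Let $P_Y$ be the probability mass function of the output of the negative binomial model
described by (31), where the input is always positive and follows distribution $P_X$. For every $y = 0, 1, \dots$,

$$ \frac{d}{db} P_Y(y) = \frac{1}{b} \bigl( y P_Y(y) - (y+1) P_Y(y+1) \bigr). \tag{33} $$

The result remains true if $P_Y$ is replaced by $Q_Y$ in (33).

Proof: We start with

$$ P_Y(y) = E\left\{ \binom{y+r-1}{y} \left(\frac{X}{b+X}\right)^{r} \left(\frac{b}{b+X}\right)^{y} \right\}. \tag{34} $$

Evidently,

\begin{align}
\frac{d}{db} P_Y(y)
&= E\left\{ \binom{y+r-1}{y} \frac{d}{db} \left[ \left(\frac{X}{b+X}\right)^{r} \left(\frac{b}{b+X}\right)^{y} \right] \right\} \tag{35} \\
&= E\left\{ \binom{y+r-1}{y} \left[ \left(\frac{X}{b+X}\right)^{r} \frac{d}{db} \left(\frac{b}{b+X}\right)^{y} + \left(\frac{b}{b+X}\right)^{y} \frac{d}{db} \left(\frac{X}{b+X}\right)^{r} \right] \right\} \tag{36} \\
&= E\left\{ \binom{y+r-1}{y} \left(\frac{X}{b+X}\right)^{r} \left(\frac{b}{b+X}\right)^{y} \left[ \frac{yX}{b(b+X)} - \frac{r}{b+X} \right] \right\} \tag{37} \\
&= E\left\{ \binom{y+r-1}{y} \left(\frac{X}{b+X}\right)^{r} \left(\frac{b}{b+X}\right)^{y} \left[ \frac{y}{b} - \frac{y+r}{b+X} \right] \right\} \tag{38} \\
&= \frac{y}{b} P_Y(y) - \frac{y+1}{b} E\left\{ \frac{y+r}{y+1} \binom{y+r-1}{y} \left(\frac{X}{b+X}\right)^{r} \left(\frac{b}{b+X}\right)^{y+1} \right\} \tag{39} \\
&= \frac{1}{b} \bigl( y P_Y(y) - (y+1) P_Y(y+1) \bigr). \tag{40}
\end{align}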
Since (33) holds for any input distribution $P_X$, it remains true if $P_X$ is replaced by another
distribution $Q_X$, as long as the input is always positive. ■
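As with the binomial case, the identity in Lemma 2, that the derivative of $P_Y(y)$ with respect to $b$ equals $(y P_Y(y) - (y+1) P_Y(y+1))/b$, can be checked by finite differences. This sketch is illustrative only: it assumes an integer `r`, a hypothetical discrete prior on positive inputs, and checks only the first few values of $y$.

```python
import math


def negbinom_output_pmf(r, b, prior, y_max):
    """P_Y(y) for y = 0..y_max; prior is a dict {x: probability} with x > 0."""
    pmf = []
    for y in range(y_max + 1):
        total = 0.0
        for x, p in prior.items():
            total += (p * math.comb(y + r - 1, y)
                      * (x / (b + x)) ** r * (b / (b + x)) ** y)
        pmf.append(total)
    return pmf


def lemma2_rhs(r, b, prior, y_max):
    """Right-hand side: (y P_Y(y) - (y+1) P_Y(y+1)) / b for y = 0..y_max."""
    p = negbinom_output_pmf(r, b, prior, y_max + 1)
    return [(y * p[y] - (y + 1) * p[y + 1]) / b for y in range(y_max + 1)]


def lemma2_lhs(r, b, prior, y_max, h=1e-6):
    """Left-hand side: central finite difference of P_Y(y) with respect to b."""
    up = negbinom_output_pmf(r, b + h, prior, y_max)
    down = negbinom_output_pmf(r, b - h, prior, y_max)
    return [(u - d) / (2 * h) for u, d in zip(up, down)]


# Hypothetical toy setup.
r, b, y_max = 3, 1.5, 20
prior = {0.5: 0.2, 2.0: 0.5, 5.0: 0.3}
max_gap = max(abs(l - r_) for l, r_ in
              zip(lemma2_lhs(r, b, prior, y_max), lemma2_rhs(r, b, prior, y_max)))
```

No truncation error enters here, because the identity holds separately for each $y$; only the finite-difference step limits the agreement.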

It is interesting to see that (33) is literally identical to (5) if the two parameters $a$ and $b$ are
identical.

The proof of Theorem 2 based on Lemma 2 resembles that of Theorem 1.

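Proof of Theorem 2: Using similar techniques as in the proof of Theorem 1, we arrive at

$$ \frac{d}{db} D(P_Y \| Q_Y) = \frac{1}{b} \sum_{y=1}^{\infty} y P_Y(y) \bigl( T(y) - 1 - \log T(y) \bigr) \tag{41} $$

where

$$ T(y) = \frac{P_Y(y-1) Q_Y(y)}{P_Y(y) Q_Y(y-1)}. \tag{42} $$

Moreover,

\begin{align}
P_Y(y-1) &= E\left\{ \binom{y+r-2}{y-1} \left(\frac{X}{b+X}\right)^{r} \left(\frac{b}{b+X}\right)^{y-1} \right\} \tag{43} \\
&= \frac{y}{y+r-1} E\left\{ \frac{b+X}{b} \binom{y+r-1}{y} \left(\frac{X}{b+X}\right)^{r} \left(\frac{b}{b+X}\right)^{y} \right\} \tag{44} \\
&= \frac{y}{y+r-1} E\left\{ 1 + \frac{X}{b} \,\middle|\, Y = y \right\} P_Y(y). \tag{45}
\end{align}

Similarly,

$$ Q_Y(y-1) = \frac{y}{y+r-1} E_Q\left\{ 1 + \frac{X}{b} \,\middle|\, Y = y \right\} Q_Y(y). \tag{46} $$

Therefore,

$$ T(y) = \frac{E\{X|Y=y\} + b}{E_Q\{X|Y=y\} + b}. \tag{47} $$

Theorem 2 is thus established using (3), (41) and (47).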
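The relationship established above, $\frac{d}{db} D(P_Y \| Q_Y) = b^{-1} \sum_{y \ge 1} y P_Y(y) \, g(T(y))$ with $T(y) = (E\{X|Y=y\} + b)/(E_Q\{X|Y=y\} + b)$, can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original note: `r` is an integer, both priors are hypothetical two-point distributions on positive inputs, and the infinite sums are truncated at `y_max`, which is chosen large enough that the neglected tail is far below the comparison tolerance.

```python
import math


def g(t):
    """g(t) = t - 1 - log t from (3)."""
    return t - 1.0 - math.log(t)


def output_stats(r, b, prior, y_max):
    """Return (pmf, post_mean): P_Y(y) and E{X | Y = y} for y = 0..y_max."""
    pmf, post_mean = [], []
    for y in range(y_max + 1):
        joint = {x: p * math.comb(y + r - 1, y)
                 * (x / (b + x)) ** r * (b / (b + x)) ** y
                 for x, p in prior.items()}
        total = sum(joint.values())
        pmf.append(total)
        post_mean.append(sum(x * w for x, w in joint.items()) / total)
    return pmf, post_mean


def relative_entropy(r, b, prior_p, prior_q, y_max):
    """Truncated D(P_Y || Q_Y) in nats for the negative binomial model."""
    p_y, _ = output_stats(r, b, prior_p, y_max)
    q_y, _ = output_stats(r, b, prior_q, y_max)
    return sum(p * math.log(p / q) for p, q in zip(p_y, q_y))


def derivative_via_estimates(r, b, prior_p, prior_q, y_max):
    """(1/b) * sum_y y P_Y(y) g((E{X|y} + b) / (E_Q{X|y} + b)), truncated."""
    p_y, mean_p = output_stats(r, b, prior_p, y_max)
    _, mean_q = output_stats(r, b, prior_q, y_max)
    return sum(y * p_y[y] * g((mean_p[y] + b) / (mean_q[y] + b))
               for y in range(1, y_max + 1)) / b


# Hypothetical toy setup.
r, b, h, y_max = 3, 1.0, 1e-5, 200
prior_p = {0.5: 0.4, 2.0: 0.6}
prior_q = {1.0: 0.5, 3.0: 0.5}
numeric = (relative_entropy(r, b + h, prior_p, prior_q, y_max)
           - relative_entropy(r, b - h, prior_p, prior_q, y_max)) / (2 * h)
closed_form = derivative_via_estimates(r, b, prior_p, prior_q, y_max)
```

As in the binomial case, the conditional means on both sides come from the original model with the same `r` and `b`; only the prior is mismatched.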